Prototype: support eager analysis inside Pipelines query functions #53700

Closed
sryza wants to merge 7 commits into apache:master from sryza:analyze-in-query-function

@sryza (Contributor) commented Jan 6, 2026

What changes were proposed in this pull request?

This is a WIP change that adds support for using functions like DataFrame.schema and DataFrame.columns inside pipeline query functions.

The change makes graph resolution partially asynchronous.

Many of the data structures that were previously maintained as local variables inside transformDownNodes have been moved to a GraphAnalysisContext object. Moving them into a separate object makes them accessible from Spark Connect RPC handlers that:

  • Register query function results
  • Poll for query functions to execute
  • Analyze within the context of the graph

We're also essentially introducing a new state that flows can be in during resolution: “waiting for query function result”.
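As a rough illustration of the three handler responsibilities and the extra flow state, here is a minimal, self-contained Python sketch. It is not the code in this PR: `GraphAnalysisContextSketch`, `FlowState`, and every method name are hypothetical stand-ins, and a flow's result is simplified to a list of column names.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List, Optional


class FlowState(Enum):
    """Hypothetical states a flow can be in during graph resolution."""
    UNRESOLVED = auto()
    WAITING_FOR_QUERY_FUNCTION_RESULT = auto()
    RESOLVED = auto()


@dataclass
class GraphAnalysisContextSketch:
    """Hypothetical shared state, reachable from RPC handlers."""
    states: Dict[str, FlowState] = field(default_factory=dict)
    # Flow name -> registered result (simplified to column names).
    results: Dict[str, List[str]] = field(default_factory=dict)

    def add_flow(self, name: str) -> None:
        self.states[name] = FlowState.UNRESOLVED

    def poll_for_query_function(self) -> Optional[str]:
        """Hand out the next unresolved flow and mark it as waiting."""
        for name, state in self.states.items():
            if state is FlowState.UNRESOLVED:
                self.states[name] = FlowState.WAITING_FOR_QUERY_FUNCTION_RESULT
                return name
        return None

    def analyze(self, name: str) -> List[str]:
        """Eager analysis: return the columns of an already resolved flow."""
        if self.states.get(name) is not FlowState.RESOLVED:
            raise LookupError(f"flow {name!r} is not resolved yet")
        return self.results[name]

    def register_result(self, name: str, columns: List[str]) -> None:
        """Record a query function result and mark the flow resolved."""
        if self.states.get(name) is not FlowState.WAITING_FOR_QUERY_FUNCTION_RESULT:
            raise ValueError(f"flow {name!r} is not waiting for a result")
        self.results[name] = list(columns)
        self.states[name] = FlowState.RESOLVED


# Usage: the second query function inspects the first flow's columns
# (the analogue of DataFrame.columns) before producing its own result.
ctx = GraphAnalysisContextSketch()
ctx.add_flow("raw")
ctx.add_flow("cleaned")

first = ctx.poll_for_query_function()
ctx.register_result(first, ["id", "value"])

second = ctx.poll_for_query_function()
upstream_columns = ctx.analyze("raw")
ctx.register_result(second, [c for c in upstream_columns if c != "value"])
```

The sketch keeps all bookkeeping in one object so that polling, registering, and analyzing can happen in separate calls rather than inside a single traversal, which is the motivation given above for moving state out of local variables.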

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

@github-actions Bot commented Jan 6, 2026

⚠️ Pull Request Title Validation

This pull request title does not contain a JIRA issue ID.

Please update the title to either:

  • Include a JIRA ID: [SPARK-12345] Your description
  • Mark as minor change: [MINOR] Your description

For minor changes that don't require a JIRA ticket (e.g., typo fixes), please prefix the title with [MINOR].


This comment was automatically generated by GitHub Actions

@sryza changed the title from "Analyze in query function" to "Prototype: support eager analysis inside Pipelines query functions" on Jan 6, 2026
@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions Bot added the Stale label Apr 23, 2026
@github-actions github-actions Bot closed this Apr 23, 2026